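1. Preparing the model training data
How the target (crosshair) data is annotated: each target position is given as an offset computed from the center point of its image.
Image source: AIdea
Implementation logic
(1) Pull out the images that have target data in each class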
img_tag_Y_df_dict = {}    # plant_class -> DataFrame of the tagged rows
img_tag_Y_list_dict = {}  # plant_class -> list of tagged image names

def img_tag_Y_list(plant_class):
    """
    Get the list of image names that have a target tag in each plant_class.
    """
    img_tag_Y_list_df = img_tag_Y[img_tag_Y["Img"].isin(img_name_plant_class[plant_class])]
    img_tag_Y_df_dict[plant_class] = img_tag_Y_list_df
    img_tag_Y_list_dict[plant_class] = img_tag_Y_list_df["Img"].tolist()

for plant_class in plant_class_list:
    img_tag_Y_list(plant_class)

for k, v in img_tag_Y_list_dict.items():
    print("{}: Numbers of crop_img = {}".format(k, len(v)))
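To make the filtering step easier to try in isolation, here is a minimal sketch of the same `isin` idea on toy data. The file names, class names and offsets below are made up for illustration, and `tagged_images_by_class` is a hypothetical helper (not part of my notebook) that returns a dict instead of filling global dicts.

```python
import pandas as pd

# Toy stand-ins for the real tables; every value here is invented.
img_tag_Y = pd.DataFrame({
    "Img": ["a.jpg", "b.jpg", "c.jpg"],
    "target_x": [10, -5, 0],
    "target_y": [-20, 15, 0],
})
img_name_plant_class = {
    "banana": ["a.jpg", "x.jpg"],   # x.jpg has no target row
    "carrot": ["b.jpg", "c.jpg"],
}

def tagged_images_by_class(tag_df, names_by_class):
    """Return {plant_class: DataFrame of rows whose Img belongs to that class}."""
    return {
        plant_class: tag_df[tag_df["Img"].isin(names)]
        for plant_class, names in names_by_class.items()
    }

tagged = tagged_images_by_class(img_tag_Y, img_name_plant_class)
counts = {k: len(v) for k, v in tagged.items()}
print(counts)  # {'banana': 1, 'carrot': 2}
```

Returning the dict keeps the function free of side effects, which makes it simpler to rerun on a subset of classes.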
(2) Crop a square around the target position
def crop_img_target(img_data, plant_class, img_name, img_shape, crop_length):
    """
    Get the cropped image from the raw image and its target info.
    crop_length is half the side length of the square crop.
    """
    img = img_data
    img_tag_Y_plant_df = img_tag_Y_df_dict[plant_class]
    ## target info
    target_df = img_tag_Y_plant_df[img_tag_Y_plant_df["Img"] == img_name]
    target_y = target_df["target_y"].iloc[0]  # take the scalar, not a one-row Series
    target_x = target_df["target_x"].iloc[0]
    ## img shape info
    img_h = img_shape[0]  # height
    img_w = img_shape[1]  # width
    orig_y = img_h / 2  # origin of coordinates in each image (its center)
    orig_x = img_w / 2
    ## target location: y pairs with the height, x with the width
    ## (assumes target_y grows downward; flip its sign if the annotation's y axis points up)
    aim_y = int(orig_y + target_y)
    aim_x = int(orig_x + target_x)
    ## crop image around the target
    crop_h_lower, crop_h_upper = int(aim_y - crop_length), int(aim_y + crop_length)
    crop_w_lower, crop_w_upper = int(aim_x - crop_length), int(aim_x + crop_length)
    # array indexing is rows (height, y) first, then columns (width, x)
    crop_img = img[crop_h_lower:crop_h_upper, crop_w_lower:crop_w_upper]
    return img_name, crop_img
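As a way to check the coordinate logic without the real dataset, the following is a minimal sketch on a synthetic NumPy image. It assumes the target is a pixel offset from the image center with x growing to the right and y growing downward, which I have not confirmed against the dataset's documentation. `crop_around_target` and `half_side` are hypothetical names, and the clamping to the image border is an addition that the function above does not have.

```python
import numpy as np

def crop_around_target(img, target_x, target_y, half_side):
    """Crop a square of side 2*half_side centered on the target.

    Assumes (target_x, target_y) are pixel offsets from the image center,
    x to the right and y downward. The window is clamped to the image,
    so crops near an edge come out smaller instead of wrapping around.
    """
    img_h, img_w = img.shape[:2]
    aim_y = int(img_h / 2 + target_y)  # row index of the target
    aim_x = int(img_w / 2 + target_x)  # column index of the target
    top = max(0, aim_y - half_side)
    bottom = min(img_h, aim_y + half_side)
    left = max(0, aim_x - half_side)
    right = min(img_w, aim_x + half_side)
    return img[top:bottom, left:right]

# Synthetic 400x600 image with one white marker pixel at row 230, column 250,
# i.e. an offset of (-50, +30) from the center at column 300, row 200.
img = np.zeros((400, 600, 3), dtype=np.uint8)
img[230, 250] = 255

crop = crop_around_target(img, target_x=-50, target_y=30, half_side=95)
print(crop.shape)    # (190, 190, 3)
print(crop[95, 95])  # [255 255 255]

# A target close to the left border gives a narrower crop.
edge_crop = crop_around_target(img, target_x=-290, target_y=0, half_side=95)
print(edge_crop.shape)  # (190, 105, 3)
```

If the marker does not land in the middle of the crop, the x/y pairing or the sign convention is the first thing to check.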
(3) Apply the same processing to the images of all 33 classes
Error_list = []
for plant_class, img_list in progress_bar(img_tag_Y_list_dict.items()):
    for img_name in progress_bar(img_list):
        img_data, img_shape = get_img_shape(plant_class, img_name)
        try:
            # crop_length = 95 gives a 190 x 190 crop
            img_name_AA, crop_img_AA = crop_img_target(img_data, plant_class, img_name, img_shape, 95)
            save_crop_img(img_name_AA, crop_img_AA)
        except Exception as ee:
            Error_list.append(ee)
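One thing the loop above loses is which image caused each error. Below is a minimal sketch of a variant that records the class and image name next to the exception. `run_batch` and `fake_process` are hypothetical names used only for this illustration; in the real pipeline the callback would wrap the load, crop and save steps.

```python
def run_batch(images_by_class, process_one):
    """Call process_one(plant_class, img_name) for every image.

    Returns (n_ok, failures). Each failure keeps the class, the image name
    and the exception text, so a failed crop can be traced back later.
    """
    n_ok = 0
    failures = []
    for plant_class, img_list in images_by_class.items():
        for img_name in img_list:
            try:
                process_one(plant_class, img_name)
                n_ok += 1
            except Exception as exc:
                failures.append({
                    "plant_class": plant_class,
                    "img_name": img_name,
                    "error": repr(exc),
                })
    return n_ok, failures

def fake_process(plant_class, img_name):
    # Stand-in for the real processing: fails on anything that is not a .jpg
    if not img_name.endswith(".jpg"):
        raise ValueError("not an image: " + img_name)

n_ok, failures = run_batch(
    {"banana": ["a.jpg", "note.txt"], "carrot": ["b.jpg"]},
    fake_process,
)
print(n_ok)                     # 2
print(failures[0]["img_name"])  # note.txt
```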
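(4) Cropping results
Decompressing the full training dataset took about 4 hours 37 minutes.
The first preprocessing pass (cropping every image that has a target) took about 1 hour 5 minutes and produced 19,616 images in total.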
Current challenges:
(1) Some classes have too few images with a target.
(2) The cropped images are offset from the annotated target position (target_x, target_y).
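For challenge (1), a quick way to see which classes are short on tagged images is to count them and flag the ones under a threshold. This is only a sketch: `classes_below` is a hypothetical helper, and the threshold and the toy class lists are arbitrary values chosen for the example, not numbers from the competition.

```python
def classes_below(images_by_class, min_count):
    """Return {plant_class: count} for classes with fewer than min_count
    images, ordered from the smallest count upward."""
    counts = {k: len(v) for k, v in images_by_class.items()}
    short = [(k, n) for k, n in counts.items() if n < min_count]
    return dict(sorted(short, key=lambda kv: kv[1]))

# Toy input in the same shape as img_tag_Y_list_dict (class -> image names)
toy = {
    "banana": ["a.jpg", "b.jpg", "c.jpg"],
    "carrot": ["d.jpg"],
    "taro": [],
}
short_classes = classes_below(toy, min_count=2)
print(short_classes)  # {'taro': 0, 'carrot': 1}
```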
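Closing thoughts:
Today I could finally focus on coding without distractions, although the continuing intermittent earthquakes kept everyone on edge. The project seems to be moving slower than I expected, but I also know that data processing is always the part that takes the most time. With more practice and thinking I should keep improving, right? XD
I thought for a long time about whether to tell my friends that I signed up for this competition. There are so many outstanding peers, but exchanging ideas is the only way to improve faster~!! I've decided to ask all the excellent seniors and friends for advice once the competition is over!
If you are reading this post and have any suggestions, please don't hesitate to leave a comment (bows with clasped hands, thank you!)
Hours worked today: 50 min × 6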
Don’t be pushed around by the fears in your mind. Be led by the dreams in your heart.
See you next week~